Computer software may include automated model-selection procedures (such as stepwise selection) that
add or drop covariates for you. We discourage relying on these in biostatistics, because you need close
control over how a model is fitted in order to interpret its results. However, these procedures can be
used to build comparison models, or to suggest what an improved model might look like, and both are
perfectly reasonable ways to explore how to improve your model.
Understanding Interaction (Effect Modification)
In Chapter 17, we touch on the topic of interaction (also known as effect modification). Interaction
occurs when the relationship between an exposure and an outcome depends strongly on the status of
another covariate. Imagine that you conducted a study of laborers who had been exposed to asbestos at work,
and you found that being exposed to asbestos at work was associated with three times the odds of
getting lung cancer compared to not being exposed. In another study, you found that individuals who
smoked cigarettes had twice the odds of getting lung cancer compared to those who did not smoke.
Knowing this, what would you predict the odds of getting lung cancer to be for asbestos-exposed
workers who also smoke cigarettes, compared to workers who are neither exposed to asbestos nor
smokers? Do you think the effects would be additive, meaning that the excess odds above 1 add up
(1 + (3 − 1) + (2 − 1) = 4 times the odds)? Or do you think they would be multiplicative, meaning
three times two equals six times the odds?
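The two guesses can be written as a tiny calculation. This sketch assumes the odds ratios of 3 (asbestos) and 2 (smoking) from the example and uses the usual epidemiological definition of additivity, under which the excess odds (each odds ratio minus 1) add, so the additive expectation is 3 + 2 − 1 = 4. The function name `expected_joint_or` is hypothetical.

```python
def expected_joint_or(or_asbestos: float, or_smoking: float) -> dict:
    """Expected odds ratio for people with BOTH exposures if there is no interaction.

    Both inputs, and both outputs, are relative to people with neither exposure.
    """
    return {
        # Additive scale: the excess odds (OR - 1) of each factor add up.
        "additive": or_asbestos + or_smoking - 1,
        # Multiplicative scale: the odds ratios multiply.
        "multiplicative": or_asbestos * or_smoking,
    }


if __name__ == "__main__":
    print(expected_joint_or(3.0, 2.0))  # {'additive': 4.0, 'multiplicative': 6.0}
```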
Although this is just an example, it turns out that in real life, exposure to both asbestos and
cigarette smoking produces a greater-than-multiplicative synergistic interaction (meaning much
greater than six times the odds) for getting lung cancer. In other words, the risk of lung cancer for
cigarette smokers depends on their asbestos-exposure status, and the risk of lung cancer for asbestos
workers depends on their cigarette-smoking status. Because the two factors work together to increase
the risk, this is a synergistic interaction; the opposite, where one factor weakens the effect of the
other, is called an antagonistic interaction.
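One way to express this distinction on the multiplicative scale is the ratio of the observed joint odds ratio to the product of the single-exposure odds ratios: above 1 suggests synergy, below 1 suggests antagonism. This is a minimal sketch; the function names are hypothetical, and the joint odds ratio of 10 in the usage line is invented for illustration, not taken from any study.

```python
def multiplicative_interaction(or_both: float, or_a_only: float, or_b_only: float) -> float:
    """Ratio of the observed joint OR to the product of the single-exposure ORs.

    All odds ratios are relative to the group with neither exposure.
    """
    return or_both / (or_a_only * or_b_only)


def describe(ratio: float, tol: float = 1e-9) -> str:
    """Label the interaction on the multiplicative scale (no significance test here)."""
    if ratio > 1 + tol:
        return "synergistic (greater than multiplicative)"
    if ratio < 1 - tol:
        return "antagonistic (less than multiplicative)"
    return "no interaction on the multiplicative scale"


if __name__ == "__main__":
    # Invented joint OR of 10, for illustration only (not real study data).
    r = multiplicative_interaction(10.0, 3.0, 2.0)
    print(round(r, 3), describe(r))  # 1.667 synergistic (greater than multiplicative)
```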
How and when do you model an interaction in regression? Typically, you first fit your final model
using a multivariable regression approach (see the earlier section “Adjusting for confounders in
regression” for more on this). Next, once the final model is fit, you add an interaction term between
the exposure covariate (or covariates) and the covariate, usually a confounder, that you believe is the
other part of the interaction. Finally, you look at the p value on the interaction term and decide
whether to keep it.
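The last step, reading the p value on the interaction term, can be sketched for the special case of two binary exposures with grouped counts. A logistic model containing only asbestos, smoker, and asbestos × smoker is saturated (four coefficients for four exposure groups), so the maximum-likelihood interaction coefficient and its Wald standard error have closed forms and no iterative fitting is needed. The function name and all counts are hypothetical, and every cell is assumed to be nonzero; with real data and additional confounders, you would use your software's logistic regression routine instead.

```python
import math


def interaction_wald_test(counts):
    """Wald test for the asbestos x smoker coefficient in a saturated logistic model.

    counts maps (asbestos, smoker) -> (cases, noncases), with exposures coded 0/1.
    Assumes every count is nonzero.
    """
    def log_odds(group):
        cases, noncases = counts[group]
        return math.log(cases / noncases)

    # Interaction coefficient = log(OR_both) - log(OR_asbestos) - log(OR_smoker),
    # which simplifies to this +/- contrast of the four group log odds.
    beta = log_odds((1, 1)) - log_odds((1, 0)) - log_odds((0, 1)) + log_odds((0, 0))
    # Each group's log odds has variance 1/cases + 1/noncases; groups are independent.
    se = math.sqrt(sum(1 / c + 1 / n for c, n in counts.values()))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value from the standard normal
    return beta, se, p


if __name__ == "__main__":
    # Invented counts: ORs of 3 (asbestos only), 2 (smoker only), 12 (both).
    counts = {(0, 0): (20, 400), (1, 0): (30, 200), (0, 1): (40, 400), (1, 1): (60, 100)}
    beta, se, p = interaction_wald_test(counts)
    print(f"beta={beta:.3f} se={se:.3f} p={p:.3f}")
```

With these invented counts, the joint odds ratio (12) is twice the product of the single-exposure odds ratios (3 × 2 = 6), so the interaction coefficient is ln 2, yet the p value is only about 0.07. An interaction that looks sizeable can still fail to reach significance in a modest sample.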
Imagine making a model for the study of asbestos workers, cigarette smoking, and lung cancer. The
variable asbestos is coded 1 for workers exposed to asbestos and 0 for workers not exposed to
asbestos, and the variable smoker is coded 1 for cigarette smokers and 0 for nonsmokers. The final
model would already contain asbestos and smoker, so the interaction model adds one more
covariate, asbestos × smoker, which is called the higher-order interaction term. For individuals
who have a 0 for asbestos, smoker, or both, this term drops out of their individual predicted
probability (because 1 × 0 = 0 and 0 × 0 = 0). Therefore, if this term is statistically significant,
individuals who have both exposures (the only ones for whom the term is nonzero) have a risk of
the outcome that differs significantly from what the two main effects alone would predict: greater
if the interaction coefficient is positive (synergy) and smaller if it is negative (antagonism). In that
case, the interaction term should be kept in the model.
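To see the term drop out, this sketch evaluates an interaction model for all four combinations of asbestos and smoker. The coefficients are hypothetical log-odds values chosen only for illustration (exp(1.1) is about 3.0 and exp(0.7) is about 2.0). Because the product asbestos * smoker is nonzero only for workers with both exposures, B_INTERACTION shifts only their predicted probability.

```python
import math

# Hypothetical coefficients on the log-odds scale, for illustration only.
B0, B_ASBESTOS, B_SMOKER, B_INTERACTION = -3.0, 1.1, 0.7, 0.7


def predicted_probability(asbestos: int, smoker: int) -> float:
    """Predicted probability of lung cancer from the (hypothetical) interaction model."""
    interaction = asbestos * smoker  # 1 only when both exposures are present
    log_odds = (B0 + B_ASBESTOS * asbestos + B_SMOKER * smoker
                + B_INTERACTION * interaction)
    return 1 / (1 + math.exp(-log_odds))  # inverse logit


if __name__ == "__main__":
    for a in (0, 1):
        for s in (0, 1):
            print(f"asbestos={a} smoker={s} product={a * s} "
                  f"p={predicted_probability(a, s):.3f}")
```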